Picture proofs 



Do you ever walk through a proof, understand each step, yet not believe the 
theorem, not say 'Yes, of course it's true'? The analytic, logical, sequential 
approach often does not convince one as well as does a carefully crafted 
picture. This difference is no coincidence. The analytic, sequential portions 
of our brain evolved with our capacity for language, which is perhaps $10^5$
years old. Our pictorial, Gestalt hardware results from millions of years 
of evolution of the visual system and cortex. In comparison to our visual 
hardware, our symbolic, sequential hardware is an ill-developed latecomer. 
Advertisers know that words alone do not convince you to waste money on 
their clients' junk, so they spend zillions on images. This principle, which has 
higher applications, is the theme of this chapter. 

4.1 Adding odd numbers 

Here again is the sum from Section 2.1 that illustrated using extreme cases 
to find fencepost errors: 

$$S = \underbrace{1 + 3 + 5 + \cdots}_{n\ \text{terms}}$$

Before I show the promised picture proof, let's go through the standard 
method, proof by induction, to compare it later to the picture proof. An 
induction proof has three pieces: 

1. Verify the base case n = 1. With n = 1 terms, the sum is S = 1, which
equals $n^2$. QED (Latin for 'quite easily done').

2. Assume the induction hypothesis. Assume that the sum holds for n terms: 

$$\sum_{k=1}^{n} (2k-1) = n^2.$$



This assumption is needed for the next step of verifying the sum for n+1 
terms. 

3. Do the induction step of verifying the sum for n + 1 terms, which requires
showing that

$$\sum_{k=1}^{n+1} (2k-1) = (n+1)^2.$$



The sum splits into a new term and the old sum: 

$$\sum_{k=1}^{n+1} (2k-1) = \underbrace{2n+1}_{\text{new term}} + \sum_{k=1}^{n} (2k-1).$$

The sum on the right is $n^2$ courtesy of the induction hypothesis. So

$$\sum_{k=1}^{n+1} (2k-1) = 2n + 1 + n^2 = (n+1)^2.$$

The three parts of the induction proof are complete, and the theorem is 
proved. However, the parts may leave you feeling that you follow each step 
but do not see why the theorem is true. 

Compare it against the picture proof. Each term in the 
sum S adds one odd number represented as the area of an 
L-shaped piece. Each piece extends the square by one unit 
on each side. Adding n terms means placing n pieces and
making an n × n square. [Or is it an (n − 1) × (n − 1) square?]
The sum is the area of the square, which is $n^2$. Once you
understand this picture, you never forget why adding the
first n odd numbers gives the perfect square $n^2$.
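
The picture can also be checked mechanically. The following is a minimal Python sketch, with the function names `l_piece` and `sum_by_pieces` chosen here purely for illustration: it treats the k-th L-shaped piece as the area needed to grow a (k − 1) × (k − 1) square into a k × k square, and adds up the pieces.

```python
def l_piece(k):
    """Area of the k-th L-shaped piece: the difference between a k x k
    square and the (k - 1) x (k - 1) square that it extends."""
    return k * k - (k - 1) * (k - 1)  # simplifies to 2k - 1, the k-th odd number


def sum_by_pieces(n):
    """Total area after placing the first n L-shaped pieces."""
    return sum(l_piece(k) for k in range(1, n + 1))


for n in range(1, 6):
    pieces = [l_piece(k) for k in range(1, n + 1)]
    print(n, pieces, sum_by_pieces(n), n * n)
```

Each row prints the pieces, their total, and n squared; the last two columns should agree, which also settles the bracketed fencepost question in favor of the n × n square.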

4.2 Geometric sums

Here is a familiar series: 

$$S = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots.$$



2008-03-06 13:24:47 / rev ebd336097912+ 



Cite as: Sanjoy Mahajan, course materials for 18.098 / 6.099 Street- Fighting Mathematics, IAP 2008. 
MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. 
Downloaded on [DD Month YYYY] . 






The usual symbolic way to evaluate the sum is with the formula for a geometric
series. You can derive the formula using a trick. First compute 2S by
multiplying each term by 2:

$$2S = 2 + \underbrace{1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots}_{S}.$$

This sum looks like S, except for the first term 2. So 2S = 2 + S and S = 2.



The result, though correct, may seem like magic. Here then is a
picture proof. A square with unit area represents the first term, which
is $1/2^0$ (and is labelled 0). The second term is a 1 × 1/2 rectangle
representing $1/2^1$ (and is labelled, again by the exponent, 1). The
third term is a 1/2 × 1/2 square placed in the nook. The fourth term
is, like the second term, a rectangle. Each pair of terms fills in
three-quarters of the area still empty inside the 1 × 2
outlining rectangle. In the limit, the sum fills the 1 × 2 rectangle,
showing that S = 2.
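
To watch the rectangle fill up, here is a small Python sketch using exact fractions. The names `partial_sums` and `gaps` are illustrative assumptions; the gap is the area of the 1 × 2 rectangle not yet covered.

```python
from fractions import Fraction


def partial_sums(count):
    """Exact partial sums of 1 + 1/2 + 1/4 + ... using the first `count` terms."""
    total = Fraction(0)
    sums = []
    for k in range(count):
        total += Fraction(1, 2**k)
        sums.append(total)
    return sums


sums = partial_sums(9)
gaps = [2 - s for s in sums]  # uncovered area of the 1 x 2 rectangle

# Compare the gap before and after each pair of terms (after terms 0, 2, 4, ...).
for before, after in zip(gaps[0::2], gaps[2::2]):
    print(before, after, after / before)
```

In every row the ratio is 1/4, so each pair of terms removes three-quarters of the remaining gap, and the gap heads to zero as the sum heads to 2.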



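[Figure: the 1 × 2 rectangle filled by pieces labelled 0, 1, 2, 3, 4, 5, each label being the exponent of its term.]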






4.3 Arithmetic mean-geometric mean inequality 

A classic inequality is the arithmetic mean-geometric mean inequality. Here
are a few numerical examples before the formal statement. Take two numbers,
say, 1 and 2. Their arithmetic mean is 1.5. Their geometric mean is
$\sqrt{1 \times 2}$ = 1.414.... Now try the same operations with 2 and 3. Their
arithmetic mean is 2.5, and their geometric mean is $\sqrt{2 \times 3}$ = 2.449....
In both cases, the geometric mean is smaller than the arithmetic mean. This
pattern is the theorem of the arithmetic mean and geometric mean. It says
that when a, b ≥ 0, then

$$\underbrace{\frac{a+b}{2}}_{\text{AM}} \ge \underbrace{\sqrt{ab}}_{\text{GM}},$$

where AM means arithmetic mean and GM means geometric mean.
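
The numerical examples are easy to extend. This short Python sketch (the helper names `am` and `gm` and the extra test pairs are my own choices, not from the text) prints both means for a few pairs.

```python
import math


def am(a, b):
    """Arithmetic mean of two numbers."""
    return (a + b) / 2


def gm(a, b):
    """Geometric mean of two non-negative numbers."""
    return math.sqrt(a * b)


for a, b in [(1, 2), (2, 3), (5, 5), (0.1, 10)]:
    print(a, b, am(a, b), gm(a, b), am(a, b) >= gm(a, b))
```

The last column should always be True, and the two means should coincide only for the pair with a = b.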

It has at least two proofs: symbolic and pictorial. A picture proof is
hinted at by the designation of $\sqrt{ab}$ as the geometric mean. First, however,
I prove it symbolically. Look at $(a-b)^2$. Since it is a square,

$$(a-b)^2 \ge 0.$$

Expanding the left side gives $a^2 - 2ab + b^2 \ge 0$. Now do the magic step of
adding 4ab to both sides to get

$$a^2 + 2ab + b^2 \ge 4ab.$$

The left side is again a perfect square, whose perfection suggests taking the
square root of both sides to get

$$a + b \ge 2\sqrt{ab}.$$

Dividing both sides by 2 gives the theorem:

$$\frac{a+b}{2} \ge \sqrt{ab}.$$




Maybe you agree that, although each step is believable (and correct), the 
sequence of all of them seems like magic. The little steps do not reveal the 
structure of the argument, and the why is still elusive. For example, if the 
algebra steps had ended with 

$$\frac{a+b}{4} \ge \sqrt{ab},$$

it would not have seemed obviously wrong. We would like a proof whose 
result could not have been otherwise. 

Here then is a picture proof. Split the diameter of the circle into the
lengths a and b. The radius is (a + b)/2, which is the arithmetic mean. Now
we need to find the geometric mean, whose name is auspicious. Look at the
half chord rising from the diameter where a and b meet. It is also the
height of the dotted triangle, and that triangle is a right triangle. With
right triangles everywhere, similar triangles must come in handy. Let the
so-far-unknown length be x. By similar triangles,

$$\frac{x}{a} = \frac{b}{x},$$

so $x = \sqrt{ab}$, showing that the half chord is the geometric mean. That
half chord can never be greater than the radius, so the geometric mean is
never greater than the arithmetic mean. For the two means to be equal, the
geometric-mean half chord must slide left to become the radius, which
happens only when a = b. So the arithmetic mean equals the geometric mean
when a = b.
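
The half-chord claim can be tested without similar triangles. As a rough sketch, assume the diameter lies along a line with the circle's centre at distance (a + b)/2 from its left end; the Pythagorean theorem then gives the height of the circle above the point where a meets b. The function name `half_chord` is hypothetical.

```python
import math


def half_chord(a, b):
    """Height of the circle above the point that splits the diameter into a and b."""
    radius = (a + b) / 2
    offset = a - radius  # signed distance from the centre to the splitting point
    return math.sqrt(radius**2 - offset**2)


for a, b in [(1, 2), (2, 3), (3, 3), (1, 9)]:
    print(a, b, half_chord(a, b), math.sqrt(a * b), (a + b) / 2)
```

The third and fourth columns should match (half chord equals the geometric mean), and neither should exceed the last column (the radius).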

Compare this picture proof with the symbolic proof. The structure of
the picture proof is there to see, so to speak. The only non-obvious step is
showing that the half chord is the geometric mean $\sqrt{ab}$.
Furthermore, the picture shows why equality between the two means results
only when a = b: Only then does the half chord become the radius.

Here are two applications of the AM-GM inequality to problems from
introductory calculus that one would normally solve with derivatives. In the
first problem, you get l = 40 m of fencing to mark off a rectangular garden.
What dimensions does the garden have in order to have the largest area? If
a is the length and b is the width, then l = 2(a + b), which is 4 × AM. The
area is ab, which is $(\text{GM})^2$. Since AM ≥ GM, the consequence in terms of
this problem's parameters is

$$\text{AM} = \frac{l}{4} \ge \sqrt{\text{area}} = \text{GM}.$$

Since the geometric mean cannot be larger than l/4, which is constant, the
geometric mean is maximized when it reaches that bound, which happens when
a = b. For maximum area, therefore, choose a = b = 10 m and get A = 100 m².
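
A brute-force scan gives the same answer without derivatives or AM-GM. This is only a sketch: the function name `best_rectangle` and the grid of 4000 steps are arbitrary choices of mine.

```python
def best_rectangle(perimeter, steps=4000):
    """Scan side lengths a on a uniform grid and return (a, b, area) with the largest area."""
    half = perimeter / 2  # a + b is fixed by the fencing
    best_a, best_area = 0.0, 0.0
    for i in range(1, steps):
        a = half * i / steps
        area = a * (half - a)
        if area > best_area:
            best_a, best_area = a, area
    return best_a, half - best_a, best_area


a, b, area = best_rectangle(40)
print(a, b, area)
```

With 40 m of fencing the scan should land on the square garden, in agreement with the AM-GM argument.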

The next example in this genre is a more difficult three-dimensional
problem. Start with a unit square and cut out four identical corners,
folding in the four edges to make an open-topped box. What size should the
corners be to maximize the box volume? Call x the side length of the corner
cutout. Each side of the box has length 1 − 2x and it has height x, so the
volume is

$$V = x(1-2x)^2.$$

For lack of imagination, let's try the same trick as in the previous problem. 
Two great mathematicians, George Polya and Gabor Szego, commented that, 
'An idea which can be used once is a trick. If it can be used more than once 
it becomes a method.' So AM-GM, if it helps solve the next problem, gets 
promoted from a mere trick to the more exalted method. 











[Figure: the unit square with its four corner squares cut out, leaving the base and four flaps.]

In the previous problem, the factors in the area were a and b, and their
sum a + b was constant because it was fixed by the perimeter. Then we could
use AM-GM to find the maximum area. Here, the factors of the volume are
x, 1 − 2x, and 1 − 2x. Their sum is 2 − 3x, which is not a constant; instead it
varies as x changes. This variation means that we cannot apply the AM-GM
theorem directly. The theorem is still valid but it does not tell us what we
want to know. We want to know the largest possible volume. And, directly
applied, the theorem says that the volume is never greater than the cube of the
arithmetic mean. Making the volume equal to this value does not guarantee
that the maximum volume has been found, because the arithmetic mean is
changing as one changes x to maximize the geometric mean. The largest
volume may result where the GM is not equal to the changing AM. In the
two-dimensional problem, this issue did not arise because the AM was already
constant (it was a fixed fraction of the perimeter).

If only the factor of x were a 4x, then the 3x would disappear when
computing the AM:

$$4x + (1 - 2x) + (1 - 2x) = 2.$$

As Captain Jean-Luc Picard of The Next Generation says, 'Make it so.' You
can produce a 4x instead of an x by studying 4V instead of V:

$$4V = 4x \times (1 - 2x) \times (1 - 2x).$$

The sum of the factors is 2 and their arithmetic mean is 2/3, which is
constant. The geometric mean of the three factors is

$$\bigl(4x(1 - 2x)(1 - 2x)\bigr)^{1/3} = (4V)^{1/3}.$$

So by the AM-GM theorem:

$$\text{AM} = \frac{2}{3} \ge (4V)^{1/3} = \text{GM},$$

so

$$V \le \frac{1}{4}\left(\frac{2}{3}\right)^3 = \frac{2}{27}.$$

The volume equals this constant maximum value when the three factors 4x,
1 − 2x, and 1 − 2x are equal. This equality happens when x = 1/6, which is
the size of the corner cutouts.
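
As a sanity check on the 4V trick, the following Python sketch scans cutout sizes with exact fractions. The grid spacing of 1/600 is an assumption of mine, chosen so that 1/6 lies exactly on the grid; the names `volume` and `best_x` are illustrative.

```python
from fractions import Fraction


def volume(x):
    """Volume of the open-topped box made from a unit square with corner cutouts of side x."""
    return x * (1 - 2 * x) ** 2


steps = 600
candidates = [Fraction(i, steps) for i in range(1, steps // 2)]  # 0 < x < 1/2
best_x = max(candidates, key=volume)

print(best_x, volume(best_x))
```

The scan should pick x = 1/6 with volume 2/27, matching the bound from the AM-GM theorem.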

4.4 Logarithms



Pictures explain the early terms in many Taylor-series approximations. As
an example, I derive the first two terms for ln(1 + x). The logarithm function
is defined as an integral

$$\ln(1 + x) = \int_1^{1+x} \frac{dt}{t}.$$



An integral, especially a definite integral, suggests an area as its 
picture. As a first approximation, the logarithm is the area of 
the shaded, circumscribed rectangle. The rectangle, although it 
overestimates the integral, is easy to analyze: Its area is its width 
(which is x) times its height (which is 1). So the area is x. This 
area is the first pictorial approximation, and explains the first 
term in the Taylor series 

$$\ln(1 + x) = x - \cdots.$$

An alternative to overestimating the integral is to underestimate
it using the inscribed rectangle. Its width is still x but its
height is 1/(1 + x). For small x,

$$\frac{1}{1+x} \approx 1 - x,$$

as you can check by multiplying both sides by 1 + x:

$$1 \approx 1 - x^2.$$

This approximation is valid when $x^2$ is small, which happens when x is small.
Then the rectangle's height is 1 − x and its area is $x(1 - x) = x - x^2$.

For the second approximation, average the over- and underestimate:

$$\ln(1 + x) \approx \text{area} = \frac{x + (x - x^2)}{2} = x - \frac{x^2}{2}.$$

These terms are the first two terms in the Taylor series for ln(1 + x).
The picture for this symbolic average is a trapezoidal area,
so this series of pictures explains the first two terms. Its error
lies in making the smooth curve 1/t into a straight line, and this
error produces the higher-order terms in the series, but they are
difficult to compute just using pictures.
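
To see how good the rectangles and the trapezoid are, here is a minimal Python sketch comparing them with the library logarithm. The function name `estimates` and the sample values of x are my own choices.

```python
import math


def estimates(x):
    """Return (overestimate, underestimate, trapezoid, exact) for ln(1 + x)."""
    over = x                        # circumscribed rectangle: width x, height 1
    under = x * (1 - x)             # inscribed rectangle with height 1/(1 + x) ~ 1 - x
    trapezoid = (over + under) / 2  # average of the two, x - x^2/2
    return over, under, trapezoid, math.log1p(x)


for x in (0.01, 0.1, 0.3):
    print(x, estimates(x))
```

For small x the exact value should sit between the two rectangle areas and very close to the trapezoid; the agreement degrades as x grows, which is the higher-order terms making themselves felt.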

Alternatively you can derive all the terms from the binomial theorem and
the definition of the logarithm. The logarithm is

$$\ln(1 + x) = \int_1^{1+x} \frac{dt}{t} = \int_0^x \frac{dt}{1+t}.$$

The binomial theorem says that

$$\frac{1}{1+t} = 1 - t + t^2 - t^3 + \cdots,$$

so

$$\ln(1 + x) = \int_0^x \left(1 - t + t^2 - t^3 + \cdots\right) dt.$$

Now integrate term by term; although this procedure produces much gnashing
of the teeth among mathematicians, it is usually valid. To paraphrase a motto
of the Chicago police department, 'Integrate first, ask questions later.' Then

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots.$$

The term-by-term integration shows you the entire series. Understand both 
methods and you will not only remember the logarithm series but will also 
understand two useful techniques. 
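
The term-by-term result is easy to test numerically. A minimal sketch, with the hypothetical helper `log_series` summing the first few terms of the series:

```python
import math


def log_series(x, terms):
    """Partial sum x - x^2/2 + x^3/3 - ... with the given number of terms."""
    return sum((-1) ** (k + 1) * x**k / k for k in range(1, terms + 1))


x = 0.5
for terms in (1, 2, 4, 8, 16):
    print(terms, log_series(x, terms), math.log1p(x))
```

With two terms the sum reproduces the trapezoid value x − x²/2; adding terms should bring it steadily closer to the library value, at least for |x| < 1.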

As an application of the logarithm approximation, I estimate ln 2. A
quick application of the first two terms of the series gives:

$$\ln(1 + x) \approx \left. x - \frac{x^2}{2} \right|_{x=1} = 1 - \frac{1}{2} = \frac{1}{2}.$$

That approximation is lousy because x is 1, so squaring x does not help
produce a small $x^2/2$ term. A trick, however, improves the accuracy. Rewrite
ln 2 as

$$\ln 2 = \ln\frac{4/3}{2/3} = \ln\frac{4}{3} - \ln\frac{2}{3}.$$

Then approximate ln(4/3) as ln(1 + x) with x = 1/3 and approximate ln(2/3)
as ln(1 + x) with x = −1/3. With x = ±1/3, squaring x produces a small
number, so the error should shrink. Try it:

$$\ln\frac{4}{3} = \ln(1 + x)\Big|_{x=1/3} \approx \frac{1}{3} - \frac{1}{2}\left(\frac{1}{3}\right)^2,$$

$$\ln\frac{2}{3} = \ln(1 + x)\Big|_{x=-1/3} \approx -\frac{1}{3} - \frac{1}{2}\left(-\frac{1}{3}\right)^2.$$

When taking the difference, the quadratic terms cancel, so

$$\ln 2 = \ln\frac{4}{3} - \ln\frac{2}{3} \approx \frac{2}{3} = 0.666\ldots.$$

The true value is 0.693..., so this estimate is accurate to 5%!
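
Both estimates take only a few lines to reproduce. In this Python sketch the helper `two_term` (a name I made up) evaluates the two-term approximation exactly with fractions, and the library logarithm supplies the reference value.

```python
import math
from fractions import Fraction


def two_term(x):
    """First two terms of the series for ln(1 + x)."""
    return x - x**2 / 2


naive = two_term(Fraction(1))                                   # x = 1 directly
improved = two_term(Fraction(1, 3)) - two_term(Fraction(-1, 3))  # ln(4/3) - ln(2/3)

true_value = math.log(2)
print(naive, float(naive) / true_value)
print(improved, float(improved) / true_value)
```

The second ratio should be much closer to 1 than the first, showing how much the rewrite as ln(4/3) − ln(2/3) helps.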



4.5 Geometry 

The following pictorial problem has a natural pictorial solution: 



How do you cut an equilateral triangle into two equal halves using the 
shortest, not-necessarily-straight path? 



Here are several candidates among the infinite set of possibilities for the path. 



[Figure: four candidate bisecting paths, with lengths l = 1/√2, l = √3/2, l = 1, and l = (a mess).]



Let's compute the length of each bisecting path, with length measured in
units of the triangle side. The first candidate encloses an equilateral triangle
with one-half the area of the original triangle, so the sides of the smaller,
shaded triangle are smaller by a factor of $\sqrt{2}$. Thus the path, being one of
those sides, has length $1/\sqrt{2}$. In the second choice, the path is an altitude
of the original triangle, which means its length is $\sqrt{3}/2$, so it is longer than
the first candidate. The third candidate encloses a diamond made from two
small equilateral triangles. Each small triangle has one-fourth the area of the
original triangle with side length one, so each small triangle has side length
1/2. The bisecting path is two sides of a small triangle, so its length is 1.
This candidate is longer than the other two.

The fourth candidate is one-sixth of a circle. To find its length, find the
radius r of the circle. One-sixth of the circle has one-half the area of the
triangle, so

$$\pi r^2 = A_\text{circle} = 6 \times \frac{1}{2} A_\text{triangle} = 6 \times \frac{1}{2} \times \frac{1}{2} \times 1 \times \frac{\sqrt{3}}{2}.$$

Multiplying the pieces gives

$$\pi r^2 = \frac{3\sqrt{3}}{4}$$

and

$$r = \sqrt{\frac{3\sqrt{3}}{4\pi}}.$$

The bisection path is one-sixth of a circle, so its length is

$$l = \frac{2\pi r}{6} = \frac{\pi}{3}\sqrt{\frac{3\sqrt{3}}{4\pi}} = \sqrt{\frac{\pi\sqrt{3}}{12}}.$$

The best previous candidate (the first picture) has length $1/\sqrt{2}$ = 0.707....
Does the mess of π and square roots produce a shorter path? Roll the
drums...:

$$l = 0.67338\ldots,$$

which is less than $1/\sqrt{2}$. So the circular arc is the best bisection path so far.
However, is it the best among all possible paths? The arc-length calculation
for the circle is messy, and most other paths do not even have a closed form
for their arc lengths.
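
The four lengths can be tabulated in a few lines. This is a sketch under the stated setup (unit side length); the function name `candidate_lengths` and the dictionary keys are labels I invented for the four pictures.

```python
import math


def candidate_lengths():
    """Lengths of the four candidate bisecting paths for a unit equilateral triangle."""
    triangle_area = math.sqrt(3) / 4
    # One-sixth of the circle has half the triangle's area: pi r^2 / 6 = area / 2.
    radius = math.sqrt(6 * (triangle_area / 2) / math.pi)
    return {
        "parallel cut": 1 / math.sqrt(2),
        "altitude": math.sqrt(3) / 2,
        "diamond": 2 * 0.5,
        "circular arc": 2 * math.pi * radius / 6,
    }


for name, length in sorted(candidate_lengths().items(), key=lambda item: item[1]):
    print(f"{name:13s} {length:.5f}")
```

Sorted by length, the circular arc should come first, followed by the straight cut parallel to a side, the altitude, and the diamond.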

Instead of making elaborate calculations, try a familiar method,
symmetry, in combination with a picture. Replicate the triangle
six times to make a hexagon, and also replicate the candidate
path. Here is the result of replicating the first candidate (the
bisection line going straight across). The original triangle becomes
the large hexagon, and the enclosed half-triangle becomes
a smaller hexagon having one-half the area of the large hexagon.

Compare that picture with the result of replicating the circular-arc
bisection. The large hexagon is the same as for the last replication,
but now the bisected area replicates into a circle. Which
has the shorter perimeter, the shaded hexagon or this circle? The
isoperimetric theorem says that, of all figures with the same
area, the circle has the smallest perimeter. Since the circle and
the smaller hexagon enclose the same area, which is three times
the area of one triangle, the circle has a smaller perimeter than the hexagon,
and has a smaller perimeter than the result of replicating any other path!
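
The replicated pictures can be compared numerically as well. The sketch below assumes only the standard area formulas for a regular hexagon, (3√3/2) s² for side s, and for a circle; the function names are illustrative.

```python
import math


def regular_hexagon_perimeter(area):
    """Perimeter of the regular hexagon with the given area, (3*sqrt(3)/2) * side^2."""
    side = math.sqrt(2 * area / (3 * math.sqrt(3)))
    return 6 * side


def circle_perimeter(area):
    """Circumference of the circle with the given area."""
    return 2 * math.sqrt(math.pi * area)


enclosed = 3 * (math.sqrt(3) / 4)  # three times the area of one unit triangle
hexagon = regular_hexagon_perimeter(enclosed)
circle = circle_perimeter(enclosed)

# Each perimeter consists of six copies of the corresponding bisecting path.
print(hexagon, hexagon / 6)
print(circle, circle / 6)
```

Dividing each perimeter by six should recover the two path lengths found earlier, with the circle's share the shorter one.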

4.6 Summing series



Now let's look for a second time at Stirling's approximation to n factorial. In
Section 3.5, we found it by approximating the integral

$$\int_0^\infty t^n e^{-t}\,dt = n!.$$

The next method is also indirect, by approximating ln n!:

$$\ln n! = \sum_{k=1}^{n} \ln k.$$

[Figure: rectangles of height ln k for k = 1 to 7, drawn against the smooth curve ln k.]

This sum is the area of the rectangles. That area is
roughly the area under the smooth curve ln k. This
area is

$$\int_1^n \ln k\,dk = \left. (k\ln k - k) \right|_1^n = n\ln n - n + 1.$$



Before making more accurate approximations, let's see how this one is doing
by taking the exponential to recover n!:

$$n! \approx \frac{n^n}{e^n} \times e.$$

The $n^n$ and the $1/e^n$ factors are already correct. The next pictorial correction
makes the result even more accurate.

The error in the integral approximation comes from
the pieces protruding beyond the ln k curve. To approximate
the area of these protrusions, pretend that
they are triangles. If ln k were made of linear segments,
there would be no need to pretend; even so
the pretense is only a tiny lie. The problem becomes
one of adding up the shaded triangles.

The next step is to double the triangles, turning
them into rectangles, and remembering to repay the
factor of 2 before the end of the derivation.

[Figure: the shaded protrusions above the ln k curve, plotted for k from 1 to 7.]







The final step is to hold your right hand at the
k = 7 line to catch the shaded pieces as you shove
them rightward with your left hand. They stack to
make the ln 7 rectangle. So the total overshoot, after
paying back the factor of 2, is (ln 7)/2. For general
n, the overshoot is (ln n)/2. The integral $\int_1^n \ln k\,dk$
provides n ln n − n (from the upper limit) and 1 from
the lower limit. So the integral and graph together produce

$$\ln n! \approx n\ln n - n + 1 + \underbrace{\frac{\ln n}{2}}_{\text{protrusions}}$$

or

$$n! \approx e\sqrt{n}\left(\frac{n}{e}\right)^n.$$

Stirling's formula is

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n.$$

The difference between the pictorial approximation and Stirling's formula is
the factor of e that should be $\sqrt{2\pi}$. Except for this change of only 8%, a
simple integration and graphical method produce the whole formula.
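
To compare the three stages against the exact factorial, here is a short Python sketch. The function names `integral_only`, `pictorial`, and `stirling` are labels of my own for the integral estimate, the estimate with the protrusion correction, and Stirling's formula.

```python
import math


def integral_only(n):
    """Estimate from the integral alone: e * n^n / e^n."""
    return math.e * (n / math.e) ** n


def pictorial(n):
    """Integral plus the protrusion correction: e * sqrt(n) * (n/e)^n."""
    return math.e * math.sqrt(n) * (n / math.e) ** n


def stirling(n):
    """Stirling's formula: sqrt(2 pi n) * (n/e)^n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


for n in (5, 10, 20):
    exact = math.factorial(n)
    print(n, integral_only(n) / exact, pictorial(n) / exact, stirling(n) / exact)
```

The ratio of the pictorial estimate to Stirling's formula is e/√(2π) for every n, which is the 8% discrepancy mentioned above; the integral alone is off by a factor that grows like √n.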

The protrusion correction turns out to be the first term in an infinite series
of corrections. The later corrections are difficult to derive using pictures, just
as the later terms in the Taylor series for ln(1 + x) are difficult to derive
by pictures (we used integration and the binomial theorem for those terms).
But another technique, analogy, produces the higher corrections for ln n!.
That analysis is the subject of Section 7.3, where the pictorial, protrusion
correction that we just derived turns out to be the zeroth-derivative term in
the Euler-MacLaurin summation formula.